Improving End-to-End Speech Translation by Leveraging Auxiliary Speech and Text Data
Abstract
We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability of adapting one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when the source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system sets a new state of the art on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.
Similar resources
Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation
Current speech translation systems integrate (loosely or closely) two main modules: source language speech recognition (ASR) and source-to-target text translation (MT). In these approaches, source language text transcript (as a sequence or as a graph) appears as mandatory to produce a text hypothesis in the target language. In the meantime, deep neural networks have yielded breakthroughs in dif...
End-to-End Automatic Speech Translation of Audiobooks
We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task. Previous works investigated the extreme case where source language transcription is not available during learning nor decoding, but we also study a midway case where source language transcription is available at training time only. In this case, a single model is trained to decod...
Improving End-to-End Speech Recognition with Policy Learning
Connectionist temporal classification (CTC) is widely used for maximum likelihood learning in end-to-end speech recognition models. However, there is usually a disparity between the negative maximum likelihood and the performance metric used in speech recognition, e.g., word error rate (WER). This results in a mismatch between the objective function and metric during training. We show that the ...
End-to-End Evaluation in JANUS: A Speech-to-speech Translation System
JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe our methodology for evaluating translation performance. Our current focus is on end-to-end evaluations, that is, the evaluation of the translation capabilities of the system as a whole. The main goal of ou...
An Experimental Methodology for an End-to-End Evaluation in Speech-to-Speech Translation
This paper describes the evaluation methodology used to evaluate the TC-STAR speech-to-speech translation (SST) system and their results from the third year of the project. It follows the results presented in (Hamon et al., 2007), dealing with the first end-to-end evaluation of the project. In this paper, we try to experiment with the methodology and the protocol during the second end-to-end ev...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26637